ABSTRACT

Various reported measures of clustering in free 
recall are reviewed under categories of algebraic versus 
probabilistic approaches. Shortcomings in these measures are outlined 
and a new multi-dimensional measure is advanced which overcomes many 
of the deficiencies noted. (Author) 
STATEMENT OF FOCUS

The Wisconsin Research and Development Center for Cognitive Learning 
focuses on contributing to a better understanding of cognitive learning by 
children and youth and to the improvement of related educational practices. 
The strategy for research and development is comprehensive. It includes 
basic research to generate new knowledge about the conditions and processes 
of learning and about the processes of instruction, and the subsequent 
development of research-based instructional materials, many of which are
designed for use by teachers and others for use by students. These materials 
are tested and refined in school settings. Throughout these operations 
behavioral scientists, curriculum experts, academic scholars, and school 
people interact, insuring that the results of Center activities are based 
soundly on knowledge of subject matter and cognitive learning and that they 
are applied to the improvement of educational practice. 

This working paper is from the Motivation and Individual Differences 
in Learning and Retention Project in Program 1, Conditions and Processes of 
Learning. General objectives of the Program are to generate knowledge about 
concept learning and cognitive skills, to synthesize existing knowledge and 
develop general taxonomies, models, or theories of cognitive learning, and 
to utilize the knowledge in the development of curriculum materials and 
procedures. Contributing to these Program objectives, this project has these 
objectives: to determine the developmental role of individual differences 
and motivation-attention in the learning and memory process and to ascertain 
at what age certain individual differences become important in learning 
and memory and at what age certain motivation-retention relationships emerge;
to develop a theory of individual differences and motivation in learning 
and memory; and to develop practical means, based on the knowledge generated 
by the research, as well as synthesized from other sources, to maximize 
the retention of verbal material. 
CLUSTERING MEASURES



In a free-recall experiment, a subject (S) is presented with a list of items
which he is then instructed to recall. If the original items can be
classed into mutually exclusive categories, it has been found that S
usually arranges these items by category when he recalls them. This is
known as clustering.

Historically, indices of clustering in free-recall experiments 
have taken what is herein referred to as the "algebraic" approach. 
That is, some index or ratio is derived from certain characteristics 
of the data such as repetitions of items from a given category, runs of 
item types, lengths of runs, etc. The ratio is obtained by comparing 
these numbers to some ideal such as maximum possible runs, maximum pairs, 
etc. A review of this approach and the various types of indices it has
produced up to 1969 is presented by Shuell (1969).

In a paper which appeared after Shuell's review, Dalrymple-Alford
(1970) utilized the algebraic approach and derived the index C, which is:

$$C = \frac{\text{Repetitions} - \text{Minimum Repetitions}}{\text{Maximum Repetitions} - \text{Minimum Repetitions}}$$

Dalrymple-Alford also gives equations for finding maximum and minimum
repetitions.
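
The index can be sketched in Python as follows. The text does not reproduce Dalrymple-Alford's equations for the extremes, so this sketch assumes that maximum repetitions equal N - k (N items recalled from k categories) and that minimum repetitions equal max(0, 2m - N - 1), where m is the size of the largest recalled category. The function names are illustrative only.

```python
from collections import Counter


def repetitions(recall):
    """Count adjacent pairs of recalled items that share a category."""
    return sum(1 for x, y in zip(recall, recall[1:]) if x == y)


def c_index(recall):
    """C index for a recall sequence given as category labels.

    Assumed bounds (not stated in the text):
      maximum repetitions = N - k
      minimum repetitions = max(0, 2m - N - 1)
    """
    counts = Counter(recall)
    n = len(recall)            # N: number of items recalled
    k = len(counts)            # k: number of categories represented
    m = max(counts.values())   # m: size of the largest recalled category
    max_reps = n - k
    min_reps = max(0, 2 * m - n - 1)
    if max_reps == min_reps:
        return None            # index is undefined when the bounds coincide
    return (repetitions(recall) - min_reps) / (max_reps - min_reps)
```

Under these assumptions, `c_index("aabb")` gives 1.0 and `c_index("abab")` gives 0.0.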

More recently, Frankel and Cole (1971) have arrived at what will
be referred to in the present paper as the "probabilistic" approach.
Using statistics of sequential lists, they calculate the mean and variance
of runs (strings of items from the same category) expected in a given
list of recalled items. Using these statistics and the observed number
of runs, a z score may be obtained whose probability of occurrence may
then be read from a standard normal table.
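
A minimal sketch of this computation follows. The text does not state the formulas for the mean and variance of runs; the sketch assumes the standard moments of the multi-category runs test, namely mean = [N(N+1) - sum(n_i^2)] / N and variance = [sum(n_i^2)(sum(n_i^2) + N(N+1)) - 2N sum(n_i^3) - N^3] / [N^2 (N-1)], where n_i is the number of recalled items from category i. The function names are illustrative.

```python
import math
from collections import Counter


def count_runs(recall):
    """Number of runs (maximal strings of items from the same category)."""
    return 1 + sum(1 for x, y in zip(recall, recall[1:]) if x != y)


def runs_z(recall):
    """z score of the observed number of runs, given the recalled items.

    Uses the assumed multi-category runs-test moments described above.
    Requires at least two items and a non-zero variance.
    """
    n = len(recall)
    counts = Counter(recall).values()
    s2 = sum(c ** 2 for c in counts)
    s3 = sum(c ** 3 for c in counts)
    mean = (n * (n + 1) - s2) / n
    var = (s2 * (s2 + n * (n + 1)) - 2 * n * s3 - n ** 3) / (n ** 2 * (n - 1))
    return (count_runs(recall) - mean) / math.sqrt(var)
```

A negative z indicates fewer runs than expected by chance, i.e., more clustering.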

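Alternatively, a chi-square approximation may be used which removes
a good deal of the computation involved in obtaining z:

$$\chi^2_{(1)} = \frac{(O_R - M_R)^2}{M_R}$$

where O_R is the observed number of runs and M_R is the mean (expected)
number of runs.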

The associated probability may then be found in a table. 

Whether algebraic or probabilistic, an index of clustering ought
to be sensitive to differences in cluster length. An example will be
given which shows that both the Dalrymple-Alford measure and the z score
are fallible in this respect.

Given a presented list of 16 items, four of which fall into category
a, four into category b, four into c, and four into d, S might possibly
recall five items. Three of these possible strings are:

(1) a a b c d ,

(2) a a a b c ,

(3) a a a a b .

For each of these the Dalrymple-Alford C index is 1.00, implying perfect
clustering. Interesting results are also obtained by the probabilistic
method: for (1), z = -1.22; for (2), z = -1.33; and for (3), z = -1.22.
Thus, the latter method does not discriminate between the least clustered
case and the most clustered (most clustered being defined as items from a
category occurring together most frequently), while the former makes no
distinction at all.



The strong point of the probabilistic approach is that it yields
a conditional probability; i.e., the probability that the observed number
of runs will occur given the items that have actually been recalled.
The z score, however, does not take into account the structure of the
original population of items. Thus, the obtained value of z for (3)
does not reflect the fact that S has clustered all the items from a.

In developing a conditional probabilistic measure, then, the structure
of the sampled-from list should be taken into account as well as the
recalled list. One ideal measure would be

Pr (observed recall list) = P(A|B)
where A = recalling a given string of items
      B = recalling a given set of items
from a list whose composition has been defined by E. An example will
make this clearer. Suppose S recalls a a a a b . We then wish to find
the probability of this particular string given that four a's and one b
have been recalled from a list containing four a's, four b's, four c's,
and four d's. From the basic laws of conditional probability,

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$

In this case P(B|A) = 1, since if we are given that a certain string has
occurred, we know that a list composed of the items in that string must
have occurred. Therefore,

$$P(A \mid B) = \frac{P(A)}{P(B)}$$
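By the hypergeometric distribution,

$$P(B) = \frac{\prod_{i=1}^{J} \binom{T_i}{n_i}}{\binom{T}{N}}$$

where $\sum_{i=1}^{J} T_i = T$ and $\sum_{i=1}^{J} n_i = N$, T_i being the
number of items of category i in the presented list and n_i the number
of items of category i recalled.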
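
As a rough illustration, P(B) for the running example (four a's and one b recalled from a list of four a's, four b's, four c's, and four d's) can be computed as below. The sketch assumes the multivariate hypergeometric form, in which every set of N items is equally likely to be recalled; the function name and the dictionary layout are illustrative choices.

```python
from math import comb, prod


def prob_recalled_set(list_counts, recalled_counts):
    """P(B): probability of recalling a given composition of items.

    list_counts     maps category -> number of items in the presented list (T_i)
    recalled_counts maps category -> number of items recalled (n_i)
    Assumes all sets of N items are equally likely (sampling without replacement).
    """
    total = sum(list_counts.values())            # T
    n_recalled = sum(recalled_counts.values())   # N
    ways = prod(comb(t, recalled_counts.get(cat, 0))
                for cat, t in list_counts.items())
    return ways / comb(total, n_recalled)


presented = {"a": 4, "b": 4, "c": 4, "d": 4}
p_b = prob_recalled_set(presented, {"a": 4, "b": 1})  # equals 4 / 4368
```

Here the numerator is C(4,4) x C(4,1) x C(4,0) x C(4,0) = 4 and the denominator is C(16,5) = 4368.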



P(A) is represented by the joint probability of recalling n items
and attaining the observed number of runs; e.g., for (3) this would be
P(number of items = 5 and number of runs = 2). If we assume that items
recalled are normally distributed as well as runs, we can calculate these
probabilities after calculating the mean and variance of items recalled and
the mean and variance of runs. These probabilities can then be multiplied
to obtain P(A), which when divided by P(B) yields P(A|B). A string which
has a low conditional probability may then be said to be "significantly"
clustered.

Another way of viewing clustering is to see it as a deviation of
observed runs from the maximum possible number of runs, given that n
items have been recalled. Maximum runs are then simply equal to n,
and we have

$$\chi^2_{(1)} = \frac{(\text{observed runs} - n)^2}{n}$$

A low probability would then indicate significant clustering.
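
As a worked illustration (arithmetic only, applied to the three five-item strings used earlier, which contain 4, 3, and 2 runs respectively), the statistic takes the following values:

```latex
\begin{align*}
\text{(1) a a b c d:}\quad \chi^2_{(1)} &= \frac{(4 - 5)^2}{5} = 0.2 \\
\text{(2) a a a b c:}\quad \chi^2_{(1)} &= \frac{(3 - 5)^2}{5} = 0.8 \\
\text{(3) a a a a b:}\quad \chi^2_{(1)} &= \frac{(2 - 5)^2}{5} = 1.8
\end{align*}
```

Unlike the z scores reported above, these values increase steadily from the least clustered string to the most clustered one.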

Another way is to use a chi-square approximation in conjunction with the
number of observed and expected repetitions in the recalled list. From
Dalrymple-Alford (1970),
$$E(\text{Reps}) = \frac{\sum_{i} n_i^2}{N} - 1$$

in which n_i is the number of items of type i in the recalled list and N
is the total number of items recalled. Then

$$\chi^2 = \frac{\left[O(\text{Reps}) - E(\text{Reps})\right]^2}{E(\text{Reps})}$$
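
The repetition-based statistic can be sketched as follows. The sketch assumes that the expected number of repetitions is sum(n_i^2)/N - 1 and that the statistic is referred to a chi-square distribution with one degree of freedom; both the function names and the one-degree-of-freedom tail computation are assumptions made for illustration.

```python
import math
from collections import Counter


def observed_reps(recall):
    """Adjacent pairs of recalled items from the same category."""
    return sum(1 for x, y in zip(recall, recall[1:]) if x == y)


def expected_reps(recall):
    """Assumed chance expectation: sum(n_i^2) / N - 1."""
    n = len(recall)
    return sum(c ** 2 for c in Counter(recall).values()) / n - 1


def chi2_reps(recall):
    """(O - E)^2 / E; undefined when E is zero (all items from different categories)."""
    o, e = observed_reps(recall), expected_reps(recall)
    return (o - e) ** 2 / e


def chi2_tail_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))
```

For the string a a a a b, the observed repetitions are 3 and the expected repetitions are 17/5 - 1 = 2.4, so the statistic is 0.36/2.4 = 0.15.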

The process of clustering would seem to be dependent on at least
three subprocesses: concentration within categories, grouping, and recall.
Measures of clustering thus far proposed yield a result which might be
arrived at in a variety of ways and which does not reflect performance
on these three processes. Since this is a problem intrinsic to any
single-valued measure, an ideal clustering measure ought to result in 
more than one value which could then be collapsed into a single result 
if the investigator wished. 

The measures to be proposed have zero as their ideal cases. This runs 
counter to Frankel and Cole's (1971) contention that a cluster measure 
should increase as clustering increases. 

To measure concentration within categories, the variance of recalled
proportions may be used. This is simply

$$\mathrm{var} = \frac{\sum_{j=1}^{J} n_j^2 - J X^2}{J}$$

in which n_j = number of items recalled from category j
         J = number of categories
         X = mean proportion recalled
The measure of concentration is thus

$$V = 1 - \frac{\mathrm{var}}{\max \mathrm{var}}$$

The variance is divided by the maximum possible variance in order to 
equalize the scales on all three measures. 
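
A minimal sketch of the concentration measure follows. The text does not specify how the proportions or the maximum possible variance are to be obtained, so the sketch takes the per-category proportions and the maximum variance as inputs supplied by the investigator; the function names and the example values are hypothetical.

```python
def proportion_variance(proportions):
    """Population variance of per-category proportions: (sum(p^2) - J * mean^2) / J."""
    j = len(proportions)
    mean = sum(proportions) / j
    return (sum(p ** 2 for p in proportions) - j * mean ** 2) / j


def concentration(proportions, max_var):
    """V = 1 - var / max var; zero when the variance reaches its maximum.

    max_var is an assumption supplied by the caller, since the text
    does not give a formula for it.
    """
    return 1 - proportion_variance(proportions) / max_var


# Hypothetical example: proportions of each of four categories recalled
# when S recalls a a a a b from a list with four items per category.
example_var = proportion_variance([1.0, 0.25, 0.0, 0.0])
```

Passing the maximum variance explicitly keeps the sketch neutral about how that maximum is derived for a particular list structure.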

The measure of groupingness will simply be the mean number of terms
between an item and the next item in the string from the same category.
Thus, for a b c b b c a , there are five terms between the two a's, one
term between the first two b's, no terms between the second two b's, and
two terms between the two c's. Thus, our preliminary M (groupingness
measure) = 2, since there are eight "between" terms and four pairs of
similar terms. Note that the second b was counted twice (as the second
member of the first pair and the first member of the second pair.)
In order to keep our scales equivalent, we divide this by the maximum
possible result, which can be shown to be J - 1 (where J = number of
categories). M for this example, then, is .667.
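
The groupingness computation can be sketched as below. The function name is illustrative. Note that the value of .667 for the example string is reproduced when J = 4, which assumes that J counts the categories in the presented list (four, as in the earlier 16-item example) rather than the three categories appearing in the recalled string.

```python
from collections import defaultdict


def groupingness(recall, num_categories):
    """Mean number of intervening terms between successive same-category
    items, divided by the stated maximum of J - 1."""
    positions = defaultdict(list)
    for idx, cat in enumerate(recall):
        positions[cat].append(idx)
    # One gap for each pair of successive items from the same category.
    gaps = [later - earlier - 1
            for idxs in positions.values()
            for earlier, later in zip(idxs, idxs[1:])]
    if not gaps:
        return None  # no category was recalled more than once
    preliminary = sum(gaps) / len(gaps)
    return preliminary / (num_categories - 1)


m_example = groupingness("abcbbca", num_categories=4)  # 8 / 4 / 3
```

For a b c b b c a the gaps are 5, 1, 0, and 2, giving a preliminary value of 2 before scaling.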

Our measure of recall is

$$R = 1 - \frac{\text{number recalled}}{\text{number in list}}$$

Thus, this procedure yields three results which may be thought of
as describing a point in three-space. Perfect clustering occurs at the
origin of this space. If a single value is desired, it seems logical to
use the distance from the obtained point to the origin, which is, of
course,

$$D = \sqrt{V^2 + M^2 + R^2}$$

A single result is thus obtained whose components are easily retrieved.
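
As a brief sketch, the recall component and the combined distance could be computed as follows; the function names are illustrative, and V and M are assumed to have been obtained separately.

```python
import math


def recall_measure(num_recalled, list_length):
    """R = 1 - (number recalled / number in list); zero for complete recall."""
    return 1 - num_recalled / list_length


def clustering_distance(v, m, r):
    """Euclidean distance from the point (V, M, R) to the origin."""
    return math.sqrt(v ** 2 + m ** 2 + r ** 2)


# Example: five of sixteen items recalled, with V and M both at their ideal of zero.
r_example = recall_measure(5, 16)
d_example = clustering_distance(0.0, 0.0, r_example)
```

When V and M are both zero, the distance reduces to R alone, so the three components remain separately interpretable.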